
Labelling logical structures of document images using a dynamic perceptive neural network

Internal identifier: 000085 (France/Analysis); previous: 000084; next: 000086

Authors: Yves Rangoni [France]; Abdet Belaïd [France]; Szilárd Vajda [Germany]

Source:

International journal on document analysis and recognition (Print) [ISSN 1433-2833]; 2012.

RBID : Pascal:12-0415345

French descriptors

Pascal (Inist)
- Etiquetage, Système dynamique, Reconnaissance caractère, Reconnaissance optique caractère, Texte, Classification, Analyse documentaire, Analyse image, Reconnaissance image, Traitement image, Structure document, Présentation document, Perception sensorielle, Temps occupation, Taux erreur, Réseau neuronal, Modèle dynamique, Modélisation, Temps retard, Système à retard, Segmentation.
Wicri :
- topic : Classification.

English descriptors

KwdEn :
- Character recognition, Classification, Delay system, Delay time, Document analysis, Document layout, Document structure, Dynamic model, Dynamical system, Error rate, Image analysis, Image processing, Image recognition, Labelling, Modeling, Neural network, Occupation time, Optical character recognition, Segmentation, Sensorial perception, Text.

Abstract

This paper proposes a new method for labelling the logical structures of document images. The system starts with digitised images of paper documents, performs a physical layout analysis, runs OCR and finally exploits the OCR output to find the meaning of each block of text (i.e. it assigns labels such as "Title", "Author", etc.). The method extends our previous work, in which a classifier, the perceptive neural network, was developed as an analogy of human perception. We introduce a temporal dimension into this connectionist model by using a time-delay neural network with local representation. During the recognition stage, the system performs several recognition cycles and corrections, while keeping track of and reusing the previous outputs. This dynamic classifier thus allows better handling of noise and segmentation errors. The experiments were carried out on two datasets: the public MARG dataset, containing more than 1,500 front pages of scientific papers with four zones of interest, and a second one composed of documents from the Siggraph 2003 conference, in which 21 logical structures have been identified. The error rate on MARG is less than 2.5% and 7.3% on the Siggraph dataset.

Links toward previous steps (curation, corpus...)

to stream PascalFrancis, to step Corpus: 000077
to stream PascalFrancis, to step Curation: 000695
to stream PascalFrancis, to step Checkpoint: 000072
to stream Main, to step Merge: 000313
to stream Main, to step Curation: 000310
to stream Main, to step Exploration: 000310
to stream France, to step Extraction: 000085

Links to Exploration step

Pascal:12-0415345

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Labelling logical structures of document images using a dynamic perceptive neural network</title>
<author><name sortKey="Rangoni, Yves" sort="Rangoni, Yves" uniqKey="Rangoni Y" first="Yves" last="Rangoni">Yves Rangoni</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Nancy 2 University, LORIA</s1>
<s2>Vandœuvre-Lès-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><settlement type="city">Vandœuvre-lès-Nancy</settlement>
<settlement type="city" wicri:auto="agglo">Nancy</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Belaid, Abdet" sort="Belaid, Abdet" uniqKey="Belaid A" first="Abdet" last="Belaïd">Abdet Belaïd</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Nancy 2 University, LORIA</s1>
<s2>Vandœuvre-Lès-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><settlement type="city">Vandœuvre-lès-Nancy</settlement>
<settlement type="city" wicri:auto="agglo">Nancy</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Vajda, Szilard" sort="Vajda, Szilard" uniqKey="Vajda S" first="Szilárd" last="Vajda">Szilárd Vajda</name>
<affiliation wicri:level="3"><inist:fA14 i1="02"><s1>Computer Science Department</s1>
<s2>TU Dortmund, Dortmund</s2>
<s3>DEU</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName><region type="land" nuts="1">Rhénanie-du-Nord-Westphalie</region>
<region type="district" nuts="2">District d'Arnsberg</region>
<settlement type="city">Dortmund</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">12-0415345</idno>
<date when="2012">2012</date>
<idno type="stanalyst">PASCAL 12-0415345 INIST</idno>
<idno type="RBID">Pascal:12-0415345</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000077</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000695</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000072</idno>
<idno type="wicri:doubleKey">1433-2833:2012:Rangoni Y:labelling:logical:structures</idno>
<idno type="wicri:Area/Main/Merge">000313</idno>
<idno type="wicri:Area/Main/Curation">000310</idno>
<idno type="wicri:Area/Main/Exploration">000310</idno>
<idno type="wicri:Area/France/Extraction">000085</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Labelling logical structures of document images using a dynamic perceptive neural network</title>
<author><name sortKey="Rangoni, Yves" sort="Rangoni, Yves" uniqKey="Rangoni Y" first="Yves" last="Rangoni">Yves Rangoni</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Nancy 2 University, LORIA</s1>
<s2>Vandœuvre-Lès-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><settlement type="city">Vandœuvre-lès-Nancy</settlement>
<settlement type="city" wicri:auto="agglo">Nancy</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Belaid, Abdet" sort="Belaid, Abdet" uniqKey="Belaid A" first="Abdet" last="Belaïd">Abdet Belaïd</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Nancy 2 University, LORIA</s1>
<s2>Vandœuvre-Lès-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><settlement type="city">Vandœuvre-lès-Nancy</settlement>
<settlement type="city" wicri:auto="agglo">Nancy</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Vajda, Szilard" sort="Vajda, Szilard" uniqKey="Vajda S" first="Szilárd" last="Vajda">Szilárd Vajda</name>
<affiliation wicri:level="3"><inist:fA14 i1="02"><s1>Computer Science Department</s1>
<s2>TU Dortmund, Dortmund</s2>
<s3>DEU</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName><region type="land" nuts="1">Rhénanie-du-Nord-Westphalie</region>
<region type="district" nuts="2">District d'Arnsberg</region>
<settlement type="city">Dortmund</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">International journal on document analysis and recognition : (Print)</title>
<title level="j" type="abbreviated">Int. j. doc. anal. recognit. : (Print)</title>
<idno type="ISSN">1433-2833</idno>
<imprint><date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">International journal on document analysis and recognition : (Print)</title>
<title level="j" type="abbreviated">Int. j. doc. anal. recognit. : (Print)</title>
<idno type="ISSN">1433-2833</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Character recognition</term>
<term>Classification</term>
<term>Delay system</term>
<term>Delay time</term>
<term>Document analysis</term>
<term>Document layout</term>
<term>Document structure</term>
<term>Dynamic model</term>
<term>Dynamical system</term>
<term>Error rate</term>
<term>Image analysis</term>
<term>Image processing</term>
<term>Image recognition</term>
<term>Labelling</term>
<term>Modeling</term>
<term>Neural network</term>
<term>Occupation time</term>
<term>Optical character recognition</term>
<term>Segmentation</term>
<term>Sensorial perception</term>
<term>Text</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Etiquetage</term>
<term>Système dynamique</term>
<term>Reconnaissance caractère</term>
<term>Reconnaissance optique caractère</term>
<term>Texte</term>
<term>Classification</term>
<term>Analyse documentaire</term>
<term>Analyse image</term>
<term>Reconnaissance image</term>
<term>Traitement image</term>
<term>Structure document</term>
<term>Présentation document</term>
<term>Perception sensorielle</term>
<term>Temps occupation</term>
<term>Taux erreur</term>
<term>Réseau neuronal</term>
<term>Modèle dynamique</term>
<term>Modélisation</term>
<term>Temps retard</term>
<term>Système à retard</term>
<term>Segmentation</term>
<term>.</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr"><term>Classification</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This paper proposes a new method for labelling the logical structures of document images. The system starts with digitised images of paper documents, performs a physical layout analysis, runs an OCR and finally exploits the OCR's outputs to find the meaning of each block of text (i.e. assigns labels like "Title", "Author", etc.). The method is an extension of our previous work where a classifier, the perceptive neural network, has been developed to be an analogy of the human perception. We introduce in this connectionist model a temporal dimension by the use of a time-delay neural network with local representation. During the recognition stage, the system performs several recognition cycles and corrections, while keeping track and reusing the previous outputs. This dynamic classifier allows then a better handling of noise and segmentation errors. The experiments have been carried out on two datasets: the public MARG containing more than 1,500 front pages of scientific papers with four zones of interest and another one composed of documents from the Siggraph 2003 conference, where 21 logical structures have been identified. The error rate on MARG is less than 2.5% and 7.3% on the Siggraph dataset.</div>
</front>
</TEI>
<affiliations><list><country><li>Allemagne</li>
<li>France</li>
</country>
<region><li>District d'Arnsberg</li>
<li>Rhénanie-du-Nord-Westphalie</li>
</region>
<settlement><li>Dortmund</li>
<li>Nancy</li>
<li>Vandœuvre-lès-Nancy</li>
</settlement>
</list>
<tree><country name="France"><noRegion><name sortKey="Rangoni, Yves" sort="Rangoni, Yves" uniqKey="Rangoni Y" first="Yves" last="Rangoni">Yves Rangoni</name>
</noRegion>
<name sortKey="Belaid, Abdet" sort="Belaid, Abdet" uniqKey="Belaid A" first="Abdet" last="Belaïd">Abdet Belaïd</name>
</country>
<country name="Allemagne"><region name="Rhénanie-du-Nord-Westphalie"><name sortKey="Vajda, Szilard" sort="Vajda, Szilard" uniqKey="Vajda S" first="Szilárd" last="Vajda">Szilárd Vajda</name>
</region>
</country>
</tree>
</affiliations>
</record>
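
As an illustration of how the record above can be read programmatically, here is a minimal Python sketch that extracts the title and the author/country pairs from the `titleStmt` element. It is not part of Dilib. The function name `summarize_record` is hypothetical, the sample is a trimmed excerpt of the record, and the two `xmlns` URIs are placeholder assumptions: the page shows the `wicri:` and `inist:` prefixes without declaring them, so a namespace-aware parser would otherwise reject the record.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the record shown above. The two xmlns URIs are
# placeholders (assumptions): the page does not declare the wicri: and
# inist: prefixes, and ElementTree rejects unbound prefixes.
SAMPLE = """\
<record xmlns:wicri="urn:example:wicri" xmlns:inist="urn:example:inist">
<TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en" level="a">Labelling logical structures of document images using a dynamic perceptive neural network</title>
<author><name first="Yves" last="Rangoni">Yves Rangoni</name>
<affiliation wicri:level="1"><country>France</country></affiliation>
</author>
<author><name first="Szilárd" last="Vajda">Szilárd Vajda</name>
<affiliation wicri:level="3"><country>Allemagne</country></affiliation>
</author>
</titleStmt></fileDesc></teiHeader></TEI>
</record>
"""


def summarize_record(record_xml):
    """Return (title, [(author name, country), ...]) from a record string."""
    root = ET.fromstring(record_xml)
    # Only the titleStmt copy is read; the record repeats the same
    # authors under sourceDesc/biblStruct/analytic.
    title = root.findtext(".//titleStmt/title")
    authors = [
        (author.findtext("name"), author.findtext("affiliation/country"))
        for author in root.iterfind(".//titleStmt/author")
    ]
    return title, authors


if __name__ == "__main__":
    title, authors = summarize_record(SAMPLE)
    print(title)
    for name, country in authors:
        print(f"{name} ({country})")
```

Restricting the search to `titleStmt` avoids counting each author twice, since the full record lists them again in the `analytic` section.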

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/France/Analysis

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000085 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/France/Analysis/biblio.hfd -nk 000085 | SxmlIndent | more

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    France
   |étape=   Analysis
   |type=    RBID
   |clé=     Pascal:12-0415345
   |texte=   Labelling logical structures of document images using a dynamic perceptive neural network
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

	OCR exploration server
	Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
